With an estimated 19 million operations performed annually, cataract surgery is the most common surgical procedure. This paper investigates the automatic monitoring of tool usage during cataract surgery, with potential applications in report generation, surgical training and real-time decision support. In this study, tool usage is monitored in videos recorded through the surgical microscope. Following state-of-the-art video analysis solutions, each frame of the video is analyzed by convolutional neural networks (CNNs) whose outputs are fed to recurrent neural networks (RNNs) in order to take temporal relationships between events into account. The novelty lies in the way those CNNs and RNNs are trained. Computational complexity prevents the end-to-end training of "CNN+RNN" systems, so CNNs are usually trained first, independently from the RNNs. This approach is clearly suboptimal for surgical tool analysis: many tools are very similar to one another, but they can generally be differentiated based on past events, so CNNs should be trained to extract the visual features that are most useful in combination with the temporal context. A novel boosting strategy is proposed to achieve this goal: the CNN and RNN parts of the system are simultaneously enriched by progressively adding weak classifiers (either CNNs or RNNs) trained to improve the overall classification accuracy. Experiments were performed on a new dataset of 50 cataract surgery videos in which the usage of 21 surgical tools was manually annotated. Very good classification performance is achieved on this dataset: tool usage could be labeled with an average area under the ROC curve of $A_z = 0.9717$ in offline mode (using past, present and future information) and $A_z = 0.9696$ in online mode (using past and present information only).
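To make the described architecture concrete, the sketch below illustrates the general idea in PyTorch: per-frame CNN features feed small recurrent "weak" classifiers, and new weak classifiers are added one round at a time, each trained to improve the ensemble's per-frame tool-usage predictions. This is a minimal illustration under stated assumptions, not the authors' exact boosting procedure: the 21-class multi-label output matches the paper, but the ResNet-18 backbone, GRU size, loss, and the choice to add only RNN weak learners (the paper also allows CNN weak learners) are illustrative simplifications.

```python
# Minimal sketch (assumed details): per-frame CNN features -> ensemble of
# weak GRU classifiers, grown one per boosting round. Not the paper's code.
import torch
import torch.nn as nn
from torchvision.models import resnet18

NUM_TOOLS = 21          # one binary usage label per surgical tool (as in the paper)
FEAT_DIM = 512          # ResNet-18 feature size (assumed backbone)


class FrameCNN(nn.Module):
    """Extracts one feature vector per video frame."""
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)            # hypothetical backbone choice
        self.features = nn.Sequential(*list(backbone.children())[:-1])

    def forward(self, frames):                       # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        x = self.features(frames.flatten(0, 1))      # (B*T, FEAT_DIM, 1, 1)
        return x.view(b, t, FEAT_DIM)


class WeakRNN(nn.Module):
    """One weak temporal classifier: a small GRU plus a linear head."""
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(FEAT_DIM, hidden, batch_first=True)
        self.head = nn.Linear(hidden, NUM_TOOLS)

    def forward(self, feats):                        # feats: (B, T, FEAT_DIM)
        h, _ = self.rnn(feats)
        return self.head(h)                          # per-frame logits (B, T, 21)


def ensemble_logits(weak_rnns, feats):
    """Ensemble prediction: sum of the weak classifiers' logits."""
    return torch.stack([w(feats) for w in weak_rnns], dim=0).sum(0)


if __name__ == "__main__":
    cnn = FrameCNN()
    weak_rnns = nn.ModuleList()
    criterion = nn.BCEWithLogitsLoss()

    # Dummy batch: 2 clips of 8 frames with random multi-label tool annotations.
    frames = torch.randn(2, 8, 3, 64, 64)
    labels = torch.randint(0, 2, (2, 8, NUM_TOOLS)).float()

    for round_idx in range(3):                       # add one weak RNN per round
        new_weak = WeakRNN()
        weak_rnns.append(new_weak)
        # Only the newly added weak classifier is optimized in this round.
        opt = torch.optim.Adam(new_weak.parameters(), lr=1e-3)
        for _ in range(2):                           # a couple of toy steps
            feats = cnn(frames)
            loss = criterion(ensemble_logits(weak_rnns, feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
        print(f"round {round_idx}: loss {loss.item():.3f}")
```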